Neural Expectation Maximization
Many real-world tasks such as reasoning and physical interaction require
identification and manipulation of conceptual entities. A first step towards
solving these tasks is the automated discovery of distributed symbol-like
representations. In this paper, we explicitly formalize this problem as
inference in a spatial mixture model where each component is parametrized by a
neural network. Based on the Expectation Maximization framework we then derive
a differentiable clustering method that simultaneously learns how to group and
represent individual entities. We evaluate our method on the (sequential)
perceptual grouping task and find that it is able to accurately recover the
constituent objects. We demonstrate that the learned representations are useful
for next-step prediction.
Comment: Accepted to NIPS 201
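The grouping idea in this abstract can be illustrated with a toy spatial mixture. The sketch below is not the authors' method: it assumes a Gaussian pixel likelihood with a fixed, hypothetical `sigma`, equal mixing weights, and it replaces the neural network that parametrizes each component with a single scalar intensity `theta_k`. The names `e_step`, `m_step`, and `fit` are illustrative only. The E-step computes soft pixel-to-component assignments and the generalized M-step takes one rescaled gradient step on the expected log-likelihood.

```python
import math


def e_step(pixels, thetas, sigma=0.25):
    """Soft-assign each pixel to a component.

    resp[i][k] is proportional to N(pixels[i]; thetas[k], sigma^2),
    normalized over components k (equal mixing weights assumed).
    """
    resp = []
    for x in pixels:
        log_lik = [-((x - t) ** 2) / (2.0 * sigma ** 2) for t in thetas]
        top = max(log_lik)  # subtract the max for numerical stability
        weights = [math.exp(v - top) for v in log_lik]
        total = sum(weights)
        resp.append([w / total for w in weights])
    return resp


def m_step(pixels, resp, thetas, lr=0.5):
    """Generalized M-step: one rescaled gradient step per component.

    The gradient of the expected log-likelihood w.r.t. thetas[k] is
    sum_i resp[i][k] * (x_i - theta_k) / sigma^2. Here it is divided by
    the component's total responsibility instead, so lr=1 would jump
    straight to the responsibility-weighted mean.
    """
    updated = []
    for k, theta in enumerate(thetas):
        mass = sum(r[k] for r in resp)
        grad = sum(r[k] * (x - theta) for r, x in zip(resp, pixels))
        updated.append(theta + lr * grad / max(mass, 1e-12))
    return updated


def fit(pixels, thetas, steps=30):
    """Alternate E- and M-steps, then return parameters and assignments."""
    for _ in range(steps):
        resp = e_step(pixels, thetas)
        thetas = m_step(pixels, resp, thetas)
    return thetas, e_step(pixels, thetas)


# Six "pixels" drawn from two intensity groups (dark and bright).
pixels = [0.0, 0.1, 0.05, 0.9, 1.0, 0.95]
thetas, resp = fit(pixels, [0.3, 0.6])
groups = [max(range(len(r)), key=r.__getitem__) for r in resp]
print(thetas, groups)
```

In this toy setting the two components settle near the mean intensity of the dark and bright pixels, and the hard assignment in `groups` separates them. A learned decoder would take the place of the scalar `theta_k` in a neural variant.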
Hierarchical Relational Inference
Common-sense physical reasoning in the real world requires learning about the
interactions of objects and their dynamics. The notion of an abstract object,
however, encompasses a wide variety of physical objects that differ greatly in
terms of the complex behaviors they support. To address this, we propose a
novel approach to physical reasoning that models objects as hierarchies of
parts that may locally behave separately, but also act more globally as a
single whole. Unlike prior approaches, our method learns in an unsupervised
fashion directly from raw visual images to discover objects, parts, and their
relations. It explicitly distinguishes multiple levels of abstraction and
improves over a strong baseline at modeling synthetic and real-world videos.
Comment: Accepted to AAAI 202
The Impact of Depth and Width on Transformer Language Model Generalization
To process novel sentences, language models (LMs) must generalize
compositionally -- combine familiar elements in new ways. What aspects of a
model's structure promote compositional generalization? Focusing on
transformers, we test the hypothesis, motivated by recent theoretical and
empirical work, that transformers generalize more compositionally when they are
deeper (have more layers). Because simply adding layers increases the total
number of parameters, confounding depth and size, we construct three classes of
models which trade off depth for width such that the total number of parameters
is kept constant (41M, 134M and 374M parameters). We pretrain all models as LMs
and fine-tune them on tasks that test for compositional generalization. We
report three main conclusions: (1) after fine-tuning, deeper models generalize
better out-of-distribution than shallower models do, but the relative benefit
of additional layers diminishes rapidly; (2) within each family, deeper models
show better language modeling performance, but returns are similarly
diminishing; (3) the benefits of depth for compositional generalization cannot
be attributed solely to better performance on language modeling or on
in-distribution data.
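The depth-for-width trade at a fixed parameter budget can be made concrete with a little arithmetic. The sketch below is not the authors' procedure, which the abstract does not specify. It assumes a standard transformer block whose weights are four `d_model x d_model` attention projections plus a two-matrix feed-forward network of hidden size `d_ff`, and it ignores biases, layer norms, and embeddings. The function names, the 40M budget, and `d_model = 512` are hypothetical choices for illustration.

```python
def layer_params(d_model: int, d_ff: int) -> int:
    """Approximate weight count of one transformer block.

    Counts only the Q, K, V and output projections (4 * d_model^2) and
    the feed-forward up/down projections (2 * d_model * d_ff).
    """
    attention = 4 * d_model * d_model
    feed_forward = 2 * d_model * d_ff
    return attention + feed_forward


def total_params(n_layers: int, d_model: int, d_ff: int) -> int:
    """Approximate weight count of a stack of identical blocks."""
    return n_layers * layer_params(d_model, d_ff)


def d_ff_for_budget(budget: int, n_layers: int, d_model: int) -> int:
    """Largest feed-forward width that keeps the stack within the budget."""
    per_layer = budget // n_layers
    remaining = per_layer - 4 * d_model * d_model
    if remaining <= 0:
        raise ValueError("budget too small for this depth and d_model")
    return remaining // (2 * d_model)


# Hypothetical budget and model width; deeper stacks get narrower blocks.
budget, d_model = 40_000_000, 512
for n_layers in (1, 2, 4, 8, 16):
    d_ff = d_ff_for_budget(budget, n_layers, d_model)
    print(n_layers, d_ff, total_params(n_layers, d_model, d_ff))
```

Under these assumptions, each doubling of depth roughly halves the per-layer budget, so the feed-forward width shrinks faster than linearly once the fixed attention cost dominates. That is one reason very deep configurations become infeasible at a small budget.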